    In papyro comparison of TMM (edgeR), RLE (DESeq2), and MRN normalization methods for a simple two-conditions-without-replicates RNA-Seq experimental design

    In the past 5 years, RNA-Seq has become a powerful tool in transcriptome analysis, even though computational methods dedicated to the analysis of high-throughput sequencing data are yet to be standardized. It is, however, now commonly accepted that the choice of normalization procedure is an important step in such analyses, for example in differential gene expression analysis. The present article highlights the similarities between three normalization methods: TMM from the edgeR R package, RLE from the DESeq2 R package, and MRN. Both edgeR and DESeq2 are widely used for differential gene expression analysis. This paper introduces properties that show when these three methods give exactly the same results. These properties are proven mathematically and illustrated by in silico calculations on a given RNA-Seq data set.
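    As a reminder of what the RLE method computes, here is a minimal pure-Python sketch of median-of-ratios size factors in the style of DESeq2. It is not the DESeq2 source; the function name `rle_size_factors` and the genes-by-samples list layout are assumptions for illustration, and TMM and MRN are not implemented here.

```python
import math
from statistics import median


def rle_size_factors(counts):
    """Median-of-ratios (RLE-style) size factors.

    counts: list of genes, each gene a list of raw counts, one per sample.
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # A gene with a zero count in any sample has a geometric mean of 0,
    # so it is left out of the computation.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has positive counts in every sample")
    # Log of each gene's geometric mean across samples (the pseudo-reference).
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Median over genes of log(count / pseudo-reference), back-transformed.
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors
```

    For example, with `counts = [[10, 20], [20, 40], [30, 60]]` the second sample is sequenced exactly twice as deeply as the first, and the factors come out as about 0.707 and 1.414 (their ratio is 2 and their product is 1).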

    Manifold embedding for curve registration

    We focus on the problem of finding a good representative of a sample of random curves warped from a common pattern f. We first prove that this problem can be recast in a manifold framework. We then propose an estimator of the common pattern f based on an approximated geodesic distance on a suitable manifold, and compare the proposed method with more classical methods.
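    One common way to approximate geodesic distances between sampled points is the Isomap construction: build a k-nearest-neighbour graph and take shortest-path distances. The sketch below applies that idea to discretized curves and returns the sample curve with the smallest sum of squared approximate geodesic distances. The neighbourhood size `k`, the Euclidean distance between discretized curves, and this medoid criterion are assumptions for illustration, not necessarily the estimator used in the paper.

```python
import math


def euclid(u, v):
    """Euclidean distance between two curves sampled on the same grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def geodesic_medoid(curves, k=2):
    """Index of the curve minimizing the sum of squared approximate
    geodesic distances to all curves (Isomap-style approximation).

    curves: list of equal-length lists (curves on a common grid).
    k: neighbours per curve in the graph (hypothetical default).
    """
    n = len(curves)
    d = [[euclid(curves[i], curves[j]) for j in range(n)] for i in range(n)]
    inf = float("inf")
    g = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    # Symmetric k-nearest-neighbour graph weighted by Euclidean distance.
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        for j in others[:k]:
            g[i][j] = g[j][i] = d[i][j]
    # Floyd-Warshall: graph shortest paths approximate geodesic distances.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][m] + g[m][j] < g[i][j]:
                    g[i][j] = g[i][m] + g[m][j]
    if any(math.isinf(x) for row in g for x in row):
        raise ValueError("neighbour graph is disconnected; increase k")
    scores = [sum(x * x for x in row) for row in g]
    return min(range(n), key=lambda i: scores[i])
```

    On five translated copies of a Gaussian bump (shifts -2, -1, 0, 1, 2), the unshifted copy is returned as the representative.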

    Semi-parametric estimation of shifts

    We observe a large number of functions that differ from each other only by a translation parameter. While the main pattern is unknown, we propose to estimate the shift parameters using M-estimators. The Fourier transform makes it possible to recast this statistical problem in a semi-parametric framework. We study the convergence of the estimator and establish its asymptotic behavior. Moreover, we apply the method to velocity curve forecasting.
    Comment: Published at http://dx.doi.org/10.1214/07-EJS026 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).
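    The reason the Fourier transform helps is that a translation becomes multiplication by a phase factor: if g(t) = f(t - s) on a periodic grid of n points, the k-th DFT coefficients satisfy G_k = exp(-2iπks/n) F_k. The sketch below is not the authors' M-estimator; it is a simpler swapped-in technique that recovers an integer circular shift by maximizing the cross-correlation written in terms of DFT coefficients. The periodic setting and all names are assumptions.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for short curves)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def estimate_circular_shift(reference, shifted):
    """Integer s maximizing the match between shifted[t] and reference[t - s]."""
    n = len(reference)
    fr, fs = dft(reference), dft(shifted)

    def score(s):
        # Undo a candidate shift s by multiplying by exp(+2i*pi*k*s/n); the
        # real part of the sum is n times the cross-correlation at lag s.
        return sum((fs[k] * fr[k].conjugate()
                    * cmath.exp(2j * cmath.pi * k * s / n)).real
                   for k in range(n))

    return max(range(n), key=score)
```

    With `reference = [0, 1, 3, 7, 3, 1, 0, 0, 0, 0, 0, 0]` rotated right by 4 positions, the function returns 4.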

    Statistical properties of the quantile normalization method for density curve alignment

    The article investigates the large-sample properties of the quantile normalization method of Bolstad et al. (2003) [4], which has become one of the most popular methods for aligning density curves in microarray data analysis. We prove the consistency of this method, which is viewed as a particular case of the structural expectation procedure for curve alignment, corresponding to a notion of barycenter of measures in the Wasserstein space. Moreover, we show that this method fails in some cases of mixtures, and we propose a new methodology to cope with this issue.
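    Quantile normalization replaces each sample's values by the rank-wise mean of all sorted samples, so every sample ends up with the same empirical distribution. Averaging sorted values rank by rank is averaging empirical quantile functions, which is how a Wasserstein barycenter of one-dimensional distributions is obtained. A minimal pure-Python sketch follows; the list-of-samples layout and naive tie handling are assumptions, and published implementations treat ties in different ways.

```python
def quantile_normalize(columns):
    """Quantile normalization of equal-length samples.

    columns: list of samples, each a list of values (e.g. intensities).
    Ties are broken by original position (a simplifying assumption).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same length")
    # Rank-wise mean of the sorted samples: the common reference distribution.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    result = []
    for col in columns:
        # Give each value the reference value of the same rank.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        result.append(out)
    return result
```

    For `[[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]`, every output sample is a permutation of the reference `[2, 3, 14/3, 17/3]`.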

    Statistical Properties of the Quantile Normalization Method for Density Curve Alignment

    We present a proof for the quantile normalization method proposed by Bolstad et al. (2003), which has become one of the most popular methods for aligning density curves in microarray data analysis. We prove the consistency of this method, which is viewed as an application to density curve registration of the structural expectation, the new method proposed by Dupuy, Loubes and Maza (2011). Moreover, for the cases of mixtures in which this method fails, we propose a new methodology to cope with this issue.

    Non parametric estimation of the structural expectation of a stochastic increasing function

    This article introduces a nonparametric warping model for functional data. When the outcome of an experiment is a sample of curves, the data can be seen as realizations of a stochastic process that accounts for the small variations between the observed curves. The aim of this work is to define a mean pattern that represents the main behaviour of the set of all realizations. To this end, we define the structural expectation of the underlying stochastic function. We then provide empirical estimators of this structural expectation and of each individual warping function. Consistency and asymptotic normality of these estimators are proved.
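    For increasing curves, a natural empirical version of such a mean pattern is the inverse of the pointwise mean of the inverse functions, which mirrors the way quantile normalization averages quantile functions. The sketch below assumes that formulation; it is an illustration, not the paper's estimator, and the function names, the common level grid, and piecewise-linear interpolation are all assumptions.

```python
from bisect import bisect_left


def interp(x, xs, ys):
    """Piecewise-linear interpolation at x; xs strictly increasing, clamped."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def mean_of_inverses_pattern(grid, curves, n_levels=101):
    """Inverse of the mean inverse of strictly increasing curves.

    grid: common, strictly increasing time grid.
    curves: list of strictly increasing value lists on that grid.
    Returns the estimated pattern evaluated on the grid.
    """
    # Range of levels reached by every curve.
    lo = max(c[0] for c in curves)
    hi = min(c[-1] for c in curves)
    levels = [lo + (hi - lo) * i / (n_levels - 1) for i in range(n_levels)]
    # Each curve's inverse (time at which it reaches a level), averaged.
    mean_inv = [sum(interp(y, c, grid) for c in curves) / len(curves) for y in levels]
    # Invert the averaged inverse to get pattern values on the grid.
    return [interp(t, mean_inv, levels) for t in grid]
```

    Passing two identical curves returns that curve unchanged (up to rounding), and the output is always non-decreasing.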

    Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes

    In recent years, RNA-Seq technologies have become a powerful tool for transcriptome studies. However, computational methods dedicated to the analysis of high-throughput sequencing data are yet to be standardized. In particular, the choice of normalization procedure is known to cause great variability in the results of differential gene expression analysis. The present study compares the most widespread normalization procedures and proposes a novel one aimed at removing a bias inherent to the relative size of the studied transcriptomes. The normalization procedures are compared on real and simulated data sets. Analyses of real RNA-Seq data sets, performed with all the different normalization methods, show that only 50% of significantly differentially expressed genes are common to all of them. This result highlights the influence of the normalization step on differential expression analysis. Analyses of real and simulated data sets give similar results, revealing three groups of procedures with the same behavior. The group that includes the novel method, named “Median Ratio Normalization” (MRN), gives the lowest number of false discoveries. Within this group, the MRN method is less sensitive to changes in parameters related to the relative size of transcriptomes, such as the number of down- and upregulated genes and the gene expression levels. The newly proposed MRN method efficiently deals with the intrinsic bias resulting from the relative size of the studied transcriptomes. Validation with real and simulated data sets confirmed that MRN is more consistent and robust than existing methods.
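    The effect of relative transcriptome size can be seen on a toy pair of samples with invented counts: ten genes, identical in both conditions except one gene that is ten times higher in the second. The sketch contrasts plain total-count scaling with a median of per-gene ratios. It is a generic illustration of the bias, not the MRN procedure itself, whose individual steps are not given here; the function names are hypothetical.

```python
from statistics import median


def total_count_factor(a, b):
    """Scaling factor for sample b relative to sample a: ratio of library sizes."""
    return sum(b) / sum(a)


def median_ratio_factor(a, b):
    """Scaling factor for sample b relative to sample a: median per-gene ratio."""
    ratios = [y / x for x, y in zip(a, b) if x > 0 and y > 0]
    return median(ratios)


a = [100] * 10          # condition A
b = [1000] + [100] * 9  # condition B: only gene 0 is upregulated
```

    Total-count scaling gives a factor of 1.9, so each of the nine unchanged genes gets an apparent B/A ratio of 1/1.9, about 0.53, and would look downregulated. The median ratio gives a factor of 1, leaving those genes at a ratio of 1.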

    Prediction of sunflower grain oil concentration as a function of variety, crop management and environment using statistical models

    Sunflower (Helianthus annuus L.) is emerging as a competitive oilseed crop in the current environmentally friendly context. To help target adequate management strategies, we explored statistical models as tools to understand and predict sunflower oil concentration. A trials database was built from experiments carried out on a total of 61 varieties over the 2000–2011 period, grown in different locations in France under contrasting management conditions (nitrogen fertilization, water regime, plant density). Twenty-five literature-based predictors of seed oil concentration were used to build three statistical models (multiple linear regression, generalized additive model (GAM), regression tree (RT)), which were compared with the simple reference model of Pereyra-Irujo and Aguirrezábal (2007), based on three variables. Model performance was assessed by means of statistical indicators, including the root mean squared error of prediction (RMSEP) and model efficiency (EF). The GAM-based model performed best (RMSEP = 1.95%; EF = 0.71), while the simple model led to poor results on our database (RMSEP = 3.33%; EF = 0.09). We computed the hierarchical contribution of predictors in each model by means of R² and concluded that potential oil concentration (OC) was the leading determinant, followed by post-flowering canopy functioning indicators (LAD2 and MRUE2), plant nitrogen and water status, and the effect of high temperatures. Diagnosis of error in the four statistical models and their domains of applicability are discussed. An improved statistical model (GAM-based) was proposed for sunflower oil prediction on a large panel of genotypes grown in contrasting environments.
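    The abstract reports RMSEP and EF without giving formulas. The sketch below assumes their usual definitions: RMSEP is the square root of the mean squared difference between observed and predicted values, and EF = 1 - Σ(obs - pred)² / Σ(obs - mean(obs))², so EF = 1 is a perfect fit, EF = 0 is no better than predicting the observed mean, and negative values are worse. The function names are hypothetical.

```python
import math


def rmsep(observed, predicted):
    """Root mean squared error of prediction, in the units of the data."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def model_efficiency(observed, predicted):
    """Model efficiency: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1 - ss_res / ss_tot
```

    For observed oil concentrations of 40, 44 and 48 and predictions of 41, 44 and 47, RMSEP is √(2/3) ≈ 0.82 and EF is 1 - 2/32 = 0.9375; predicting 44 everywhere gives EF = 0.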

    TomExpress, a unified tomato RNA-Seq platform for visualization of expression data, clustering and correlation networks

    The TomExpress platform was developed to provide the tomato research community with a browser and integrated web tools for public RNA-Seq data visualization and data mining. To avoid major biases that can result from the use of different mapping and statistical processing methods, RNA-Seq raw sequence data available in public databases were mapped de novo onto a single tomato reference genome sequence and post-processed using the same pipeline with accurate parameters. Following the calculation of the number of counts per gene in each RNA-Seq sample, a common global normalization method was applied to all expression values. This unifies the whole set of expression data and makes them comparable. A database was designed where each expression value is associated with the corresponding experimental annotations. Sample details were manually curated to be easily understood by biologists. To make the data easily searchable, a user-friendly web interface was developed that provides versatile data mining web tools via on-the-fly generation of output graphics, such as expression bar plots, comprehensive in planta representations and heatmaps of hierarchically clustered expression data. In addition, it allows for the identification of co-expressed genes and the visualization of correlation networks of co-regulated gene groups. TomExpress provides one of the most complete free resources of publicly available tomato RNA-Seq data, and allows for the immediate interrogation of transcriptional programs that regulate vegetative and reproductive development in tomato under diverse conditions. The design of the pipeline developed in this project enables easy updating of the database with newly published RNA-Seq data, thereby allowing for continuous enrichment of the resource.